"CV"¶

Capstone Project at General Assembly:

  • Training and deploying computer vision model(s)
  • Detection objective: frontal face images
  • Classification objective: left vs right eye images

Foreword¶

If you will allow a few paragraphs, I would like to share some thoughts on how my capstone reached the state you are seeing today.

Background.
This project is the de facto "culmination" of just over 3 months of coursework at General Assembly.

As a digital native who is generally familiar with what data is, but with fairly pedestrian competency in coding, structuring, manipulating, and generating insights from it, I figured it would be fun to build something that fellow learners could use to accelerate their own learning.

Ideally:

"Something useful" $\bigcap$ "my (currently) rather broad fields of interest"

Seeking something reasonably challenging (from a learning rather than a data-acquisition perspective) with practical real-world applications, I settled on CV (computer vision).

In learning and execution, I realized there are many code references one can take to fulfill project requirements. However, there are also many hidden challenges, from troubleshooting GPU usage to deprecated references. For those relatively new to the field, it is worthwhile to "hack" your way to success. Do set time limits so you don't get sucked into "black holes"!

I hope the below offers reasonable breadth on the topic and accelerates your learning of computer vision. Feel free to share your feedback!


tl;dr¶

This section doubles as both summary and content navigator.

An annex is also provided to document some other experiments done as part of this project (what filters do, and the beginnings of a sliding window mechanism).

Learning objectives¶

  1. Acquire data and build an image data pipeline.
  2. Clean and augment image data.
  3. Train and deploy a computer vision model.
    1. Image Classification (CNN)
    2. Using Pretrained Models (Viola-Jones, VGG16)
    3. Transfer Learning (custom VGG16; failed)

Outcomes¶

  1. Data and Exploratory Analysis
  2. Model 1: Viola-Jones face detection
  3. Model 2: Sequential CNN model with custom dropout layer
  4. Model 3 (failed): Custom VGG16 model
  5. Bibliography

Metrics / scoring¶

For this exercise, I was not overly fussed about getting metrics / scoring in good order.

With computer vision (CV), it is fairly easy to tell if the image is classified correctly or not, or whether objects were detected.

From research, some metrics commonly used for CV tasks are listed below:

  • Classification (Precision, Recall, AUROC, F1)
  • Object Detection (mAP, IoU)
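As a quick illustration of one of the detection metrics above, a minimal IoU computation; boxes are assumed here to be in (x, y, width, height) format, matching the bounding-box table used later:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # corners of the intersection rectangle
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0
```

An IoU of 1 means a perfect overlap with the ground-truth box; mAP builds on IoU by thresholding it (commonly at 0.5) to decide whether a detection counts as correct.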

Project Start!¶

Imports¶

In [1]:
# pip install pandas
In [2]:
# pip install opencv-python
In [3]:
# pip install matplotlib
In [4]:
# pip install seaborn
In [5]:
import tensorflow as tf
import os, warnings
import pandas as pd
import numpy as np
import cv2
import matplotlib.pyplot as plt
import seaborn as sns
import glob
plt.style.use('seaborn')

from PIL import Image
from tensorflow.keras.preprocessing import image_dataset_from_directory
from tensorflow import keras
from tensorflow.keras.applications import VGG16
In [6]:
print(tf.__version__)
2.6.0
In [7]:
# GPU check; to use your GPUs, tf-gpu should be installed
# access jupyter notebook from tf-gpu session
# in anaconda prompt: conda activate tf-gpu
print("Num GPUs Available: ",
      len(tf.config.list_physical_devices('GPU')))
Num GPUs Available:  1
In [8]:
# # Reference: https://www.tensorflow.org/guide/gpu#setup
# tf.debugging.set_log_device_placement(True)

# # Create some tensors (place on CPU)
# with tf.device('/CPU:0'):
#     a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
#     b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# # Run on GPU
# c = tf.matmul(a, b)

# print(c)
In [9]:
# gpus = tf.config.list_physical_devices('GPU')
# if gpus:
#   # Restrict TensorFlow to only use the first GPU
#   try:
#     tf.config.set_visible_devices(gpus[0], 'GPU')
#     logical_gpus = tf.config.list_logical_devices('GPU')
#     print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
#   except RuntimeError as e:
#     # Visible devices must be set before GPUs have been initialized
#     print(e)

Data¶

Acquisition: Directly downloaded from source (Kaggle CelebFaces Attributes (CelebA) Dataset)

(Note: The full dataset will not be uploaded onto GitHub. Please download it separately if needed.)

Info from source:

  • img_align_celeba.zip: All the face images, cropped and aligned
  • list_eval_partition.csv: Recommended partitioning of images into training, validation, testing sets. Images 1-162770 are training, 162771-182637 are validation, 182638-202599 are testing
  • list_bbox_celeba.csv: Bounding box information for each image. "x_1" and "y_1" represent the upper left point coordinate of the bounding box; "width" and "height" represent the width and height of the bounding box
  • list_landmarks_align_celeba.csv: Image landmarks and their respective coordinates. There are 5 landmarks: left eye, right eye, nose, left mouth, right mouth
  • list_attr_celeba.csv: Attribute labels for each image. There are 40 attributes. "1" represents positive while "-1" represents negative
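To make use of the recommended partitioning, the partition codes in the eval-partition file (0/1/2 in the Kaggle version, which I am assuming map to train/validation/test) can be translated into named splits. A small sketch with a stand-in DataFrame in place of the real CSV:

```python
import pandas as pd

# Stand-in for pd.read_csv('./kaggle_celeb_images/list_eval_partition.csv');
# the real file has one row per image with a 'partition' code.
df_partition = pd.DataFrame({
    'image_id': ['000001.jpg', '162771.jpg', '182638.jpg'],
    'partition': [0, 1, 2],
})

# Assumed code meanings: 0 = train, 1 = validation, 2 = test
split_names = {0: 'train', 1: 'validation', 2: 'test'}
df_partition['split'] = df_partition['partition'].map(split_names)
train_ids = df_partition.loc[df_partition['split'] == 'train', 'image_id'].tolist()
```

The resulting image-id lists can then be used to copy or symlink files into per-split directories before calling image_dataset_from_directory.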

Useful links:

  • https://cs230.stanford.edu/blog/datapipeline/
  • https://towardsdatascience.com/8-common-data-structures-every-programmer-must-know-171acf6a1a42
  • https://datascience.stackexchange.com/questions/29223/exploratory-data-analysis-with-image-datset
In [10]:
# read in dataset
celeb_folder_path = './kaggle_celeb_images/'
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    directory=celeb_folder_path
)
Found 202600 files belonging to 1 classes.
In [11]:
dataset.class_names
Out[11]:
['img_align_celeba']
In [12]:
dataset.take(1)
Out[12]:
<TakeDataset shapes: ((None, 256, 256, 3), (None,)), types: (tf.float32, tf.int32)>

Exploratory Data Analysis¶

In [13]:
# BGR to RGB function
def convert_rgb(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

Sample images from "CelebA" dataset

In [14]:
for images, labels in dataset.take(1):
    for img in range(9):
        ax = plt.subplot(3, 3, img + 1)
        plt.imshow(images[img].numpy().astype('uint8'))
        plt.title(f'class {int(labels[img])}, {images[img].shape}')
        plt.axis("off")

Images in dataset conform to only one class (celebrity) and are of shape (256, 256, 3).

Attributes table¶

In [15]:
df_celeb_attributes = pd.read_csv('./kaggle_celeb_images/list_attr_celeba.csv')
In [16]:
df_celeb_attributes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 202599 entries, 0 to 202598
Data columns (total 41 columns):
 #   Column               Non-Null Count   Dtype 
---  ------               --------------   ----- 
 0   image_id             202599 non-null  object
 1   5_o_Clock_Shadow     202599 non-null  int64 
 2   Arched_Eyebrows      202599 non-null  int64 
 3   Attractive           202599 non-null  int64 
 4   Bags_Under_Eyes      202599 non-null  int64 
 5   Bald                 202599 non-null  int64 
 6   Bangs                202599 non-null  int64 
 7   Big_Lips             202599 non-null  int64 
 8   Big_Nose             202599 non-null  int64 
 9   Black_Hair           202599 non-null  int64 
 10  Blond_Hair           202599 non-null  int64 
 11  Blurry               202599 non-null  int64 
 12  Brown_Hair           202599 non-null  int64 
 13  Bushy_Eyebrows       202599 non-null  int64 
 14  Chubby               202599 non-null  int64 
 15  Double_Chin          202599 non-null  int64 
 16  Eyeglasses           202599 non-null  int64 
 17  Goatee               202599 non-null  int64 
 18  Gray_Hair            202599 non-null  int64 
 19  Heavy_Makeup         202599 non-null  int64 
 20  High_Cheekbones      202599 non-null  int64 
 21  Male                 202599 non-null  int64 
 22  Mouth_Slightly_Open  202599 non-null  int64 
 23  Mustache             202599 non-null  int64 
 24  Narrow_Eyes          202599 non-null  int64 
 25  No_Beard             202599 non-null  int64 
 26  Oval_Face            202599 non-null  int64 
 27  Pale_Skin            202599 non-null  int64 
 28  Pointy_Nose          202599 non-null  int64 
 29  Receding_Hairline    202599 non-null  int64 
 30  Rosy_Cheeks          202599 non-null  int64 
 31  Sideburns            202599 non-null  int64 
 32  Smiling              202599 non-null  int64 
 33  Straight_Hair        202599 non-null  int64 
 34  Wavy_Hair            202599 non-null  int64 
 35  Wearing_Earrings     202599 non-null  int64 
 36  Wearing_Hat          202599 non-null  int64 
 37  Wearing_Lipstick     202599 non-null  int64 
 38  Wearing_Necklace     202599 non-null  int64 
 39  Wearing_Necktie      202599 non-null  int64 
 40  Young                202599 non-null  int64 
dtypes: int64(40), object(1)
memory usage: 63.4+ MB
In [17]:
df_celeb_attributes.head(3).T
Out[17]:
0 1 2
image_id 000001.jpg 000002.jpg 000003.jpg
5_o_Clock_Shadow -1 -1 -1
Arched_Eyebrows 1 -1 -1
Attractive 1 -1 -1
Bags_Under_Eyes -1 1 -1
Bald -1 -1 -1
Bangs -1 -1 -1
Big_Lips -1 -1 1
Big_Nose -1 1 -1
Black_Hair -1 -1 -1
Blond_Hair -1 -1 -1
Blurry -1 -1 1
Brown_Hair 1 1 -1
Bushy_Eyebrows -1 -1 -1
Chubby -1 -1 -1
Double_Chin -1 -1 -1
Eyeglasses -1 -1 -1
Goatee -1 -1 -1
Gray_Hair -1 -1 -1
Heavy_Makeup 1 -1 -1
High_Cheekbones 1 1 -1
Male -1 -1 1
Mouth_Slightly_Open 1 1 -1
Mustache -1 -1 -1
Narrow_Eyes -1 -1 1
No_Beard 1 1 1
Oval_Face -1 -1 -1
Pale_Skin -1 -1 -1
Pointy_Nose 1 -1 1
Receding_Hairline -1 -1 -1
Rosy_Cheeks -1 -1 -1
Sideburns -1 -1 -1
Smiling 1 1 -1
Straight_Hair 1 -1 -1
Wavy_Hair -1 -1 1
Wearing_Earrings 1 -1 -1
Wearing_Hat -1 -1 -1
Wearing_Lipstick 1 -1 -1
Wearing_Necklace -1 -1 -1
Wearing_Necktie -1 -1 -1
Young 1 1 1
In [18]:
df_celeb_attributes.describe()
Out[18]:
5_o_Clock_Shadow Arched_Eyebrows Attractive Bags_Under_Eyes Bald Bangs Big_Lips Big_Nose Black_Hair Blond_Hair ... Sideburns Smiling Straight_Hair Wavy_Hair Wearing_Earrings Wearing_Hat Wearing_Lipstick Wearing_Necklace Wearing_Necktie Young
count 202599.000000 202599.000000 202599.00000 202599.000000 202599.000000 202599.000000 202599.000000 202599.000000 202599.000000 202599.000000 ... 202599.000000 202599.000000 202599.000000 202599.000000 202599.00000 202599.000000 202599.000000 202599.000000 202599.000000 202599.000000
mean -0.777728 -0.466039 0.02501 -0.590857 -0.955113 -0.696849 -0.518408 -0.530935 -0.521498 -0.704016 ... -0.886979 -0.035839 -0.583196 -0.360866 -0.62215 -0.903079 -0.055129 -0.754066 -0.854570 0.547234
std 0.628602 0.884766 0.99969 0.806778 0.296241 0.717219 0.855135 0.847414 0.853255 0.710186 ... 0.461811 0.999360 0.812333 0.932620 0.78290 0.429475 0.998482 0.656800 0.519338 0.836982
min -1.000000 -1.000000 -1.00000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 ... -1.000000 -1.000000 -1.000000 -1.000000 -1.00000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000
25% -1.000000 -1.000000 -1.00000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 ... -1.000000 -1.000000 -1.000000 -1.000000 -1.00000 -1.000000 -1.000000 -1.000000 -1.000000 1.000000
50% -1.000000 -1.000000 1.00000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 ... -1.000000 -1.000000 -1.000000 -1.000000 -1.00000 -1.000000 -1.000000 -1.000000 -1.000000 1.000000
75% -1.000000 1.000000 1.00000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 ... -1.000000 1.000000 -1.000000 1.000000 -1.00000 -1.000000 1.000000 -1.000000 -1.000000 1.000000
max 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000 1.000000

8 rows × 40 columns

In [19]:
plt.figure(figsize=(16,10))
attr_corrs = df_celeb_attributes.drop(columns='image_id').corr()
sns.heatmap(attr_corrs,
            cmap='bone', vmax=1, vmin=-1,
            mask=np.triu(np.ones_like(attr_corrs, dtype=bool)));
plt.title('Celebrity Attributes Correlation Map', fontsize=18)
plt.savefig('./graphics/attribute_corr.png')

Bounding boxes table¶

As will be seen below, the bounding-box table for the CelebA dataset likely refers to the original (uncropped) image frames, with each box specified by an upper-left point (x_1, y_1) plus a width and height.

In [20]:
# read in data
df_celeb_bbox = pd.read_csv('./kaggle_celeb_images/list_bbox_celeba.csv')
In [21]:
# check for nulls
df_celeb_bbox.isnull().sum()
Out[21]:
image_id    0
x_1         0
y_1         0
width       0
height      0
dtype: int64
In [22]:
# view data
df_celeb_bbox.head(3)
Out[22]:
image_id x_1 y_1 width height
0 000001.jpg 95 71 226 313
1 000002.jpg 72 94 221 306
2 000003.jpg 216 59 91 126
In [23]:
# plot scatter for (x_1, y_1)
plt.scatter(df_celeb_bbox['x_1'], df_celeb_bbox['y_1'], 
            marker='x', alpha=0.1);
plt.title('bbox (x_1, y_1) scatter', fontsize=18)
plt.savefig('./graphics/bbox_xy_scatter.png')
In [24]:
# plot scatter for width and height
plt.scatter(df_celeb_bbox['width'], df_celeb_bbox['height'], 
            marker='x', alpha=0.1);
plt.title('bbox width, height scatter', fontsize=18)
plt.savefig('./graphics/bbox_widthheight_scatter.png')

Eval Partition table¶

In [25]:
df_celeb_eval_partition = pd.read_csv('./kaggle_celeb_images/list_eval_partition.csv')
In [26]:
plt.plot(df_celeb_eval_partition['partition']);
plt.title('eval_partition', fontsize=18)
plt.savefig('./graphics/eval_partition.png')
In [27]:
df_celeb_landmarks = pd.read_csv('./kaggle_celeb_images/list_landmarks_align_celeba.csv')
In [28]:
df_celeb_landmarks.head(3)
Out[28]:
image_id lefteye_x lefteye_y righteye_x righteye_y nose_x nose_y leftmouth_x leftmouth_y rightmouth_x rightmouth_y
0 000001.jpg 69 109 106 113 77 142 73 152 108 154
1 000002.jpg 69 110 107 112 81 135 70 151 108 153
2 000003.jpg 76 112 104 106 108 128 74 156 98 158
In [29]:
df_celeb_landmarks.isnull().sum()
Out[29]:
image_id        0
lefteye_x       0
lefteye_y       0
righteye_x      0
righteye_y      0
nose_x          0
nose_y          0
leftmouth_x     0
leftmouth_y     0
rightmouth_x    0
rightmouth_y    0
dtype: int64
In [30]:
# check if landmarks are bounded within (256, 256) frame
df_celeb_landmarks.describe() < 256
Out[30]:
lefteye_x lefteye_y righteye_x righteye_y nose_x nose_y leftmouth_x leftmouth_y rightmouth_x rightmouth_y
count False False False False False False False False False False
mean True True True True True True True True True True
std True True True True True True True True True True
min True True True True True True True True True True
25% True True True True True True True True True True
50% True True True True True True True True True True
75% True True True True True True True True True True
max True True True True True True True True True True
In [31]:
# read in first image
drawing = convert_rgb(cv2.imread('./kaggle_celeb_images/img_align_celeba/img_align_celeba/000001.jpg'))
# draw in landmarks
# eyes
cv2.line(drawing,
         pt1=(df_celeb_landmarks['lefteye_x'][0],
              df_celeb_landmarks['lefteye_y'][0]),
         pt2=(df_celeb_landmarks['righteye_x'][0],
              df_celeb_landmarks['righteye_y'][0]),
         color=(255, 255, 0), thickness=1)
eye_width = 25
eye_height = 15
cv2.rectangle(drawing,
              pt1=(df_celeb_landmarks['lefteye_x'][0] - int(eye_width/2),
                   df_celeb_landmarks['lefteye_y'][0] - int(eye_height/2)),
              pt2=(df_celeb_landmarks['lefteye_x'][0] + int(eye_width/2),
                   df_celeb_landmarks['lefteye_y'][0] + int(eye_height/2)),
              color=(255, 0, 0), thickness=1)
cv2.rectangle(drawing,
              pt1=(df_celeb_landmarks['righteye_x'][0] - int(eye_width/2),
                   df_celeb_landmarks['righteye_y'][0] - int(eye_height/2)),
              pt2=(df_celeb_landmarks['righteye_x'][0] + int(eye_width/2),
                   df_celeb_landmarks['righteye_y'][0] + int(eye_height/2)),
              color=(255, 0, 0), thickness=1)
# mouth
cv2.line(drawing,
         pt1=(df_celeb_landmarks['leftmouth_x'][0],
              df_celeb_landmarks['leftmouth_y'][0]),
         pt2=(df_celeb_landmarks['rightmouth_x'][0],
              df_celeb_landmarks['rightmouth_y'][0]),
         color=(0, 255, 255), thickness=1)
mouth_height = 15
cv2.rectangle(drawing,
              pt1=(df_celeb_landmarks['leftmouth_x'][0],
                   df_celeb_landmarks['leftmouth_y'][0]),
              pt2=(df_celeb_landmarks['rightmouth_x'][0],
                   df_celeb_landmarks['rightmouth_y'][0] + mouth_height),
              color=(0, 0, 255), thickness=1)
# only one (x, y) coordinate for nose
nose_width = 20
nose_height = 30
cv2.rectangle(drawing,
              pt1=(df_celeb_landmarks['nose_x'][0] - int(nose_width/2),
                   df_celeb_landmarks['nose_y'][0] - int(nose_height * 0.8)),
              pt2=(df_celeb_landmarks['nose_x'][0] + int(nose_width/2),
                   df_celeb_landmarks['nose_y'][0] + int(nose_height * 0.2)),
              color=(0, 255, 0), thickness=1)
plt.axis('off')
plt.imshow(drawing);
plt.savefig('./graphics/face_with_boxes.png')

Landmarks data for each frontal face image indicate:

  • the centroids for left & right eye (yellow line; boxed in red),
  • the tip of the nose (boxed in green), and
  • the left and right corners of the mouth (cyan line; boxed in blue).

Building facial landmarks dataset¶

To keep things simple, we will pull only eye features, cropping them into a 32 x 32 frame.

Train set: Left eyes (1000 images)¶

In [32]:
for i in range(1,1001):
    img = cv2.imread(f'./kaggle_celeb_images/img_align_celeba/img_align_celeba/{str(i).zfill(6)}.jpg')
    eye_width = 32
    eye_height = 32
    x = df_celeb_landmarks['lefteye_x'][i - 1] - int(eye_width/2)
    y = df_celeb_landmarks['lefteye_y'][i - 1] - int(eye_height/2)
    crop_img = img[y : y + eye_height,
                   x : x + eye_width]
    cv2.imwrite(f'./data/lefteye/lefteye_{str(i).zfill(6)}.jpg', crop_img)
In [33]:
display_count = 15
ncols_display = 5

for i in range(1, display_count + 1):
    ax = plt.subplot(int(np.ceil(display_count / ncols_display)), ncols_display, i)
    ax.imshow(convert_rgb(cv2.imread(f'./data/lefteye/lefteye_{str(i).zfill(6)}.jpg')))
    ax.set_title(f'{i}')
    ax.axis('off')
plt.suptitle(f'Left eyes: First {display_count} images', fontsize=18);
plt.tight_layout(pad=2)
# plt.savefig('./graphics/left_eyes_15.png')


Train set: Right eyes (1000 images)¶

In [34]:
for i in range(1,1001):
    img = cv2.imread(f'./kaggle_celeb_images/img_align_celeba/img_align_celeba/{str(i).zfill(6)}.jpg')
    eye_width = 32
    eye_height = 32
    x = df_celeb_landmarks['righteye_x'][i - 1] - int(eye_width/2)
    y = df_celeb_landmarks['righteye_y'][i - 1] - int(eye_height/2)
    crop_img = img[y : y + eye_height,
                   x : x + eye_width]
    cv2.imwrite(f'./data/righteye/righteye_{str(i).zfill(6)}.jpg', crop_img)
In [35]:
display_count = 15
ncols_display = 5

for i in range(1, display_count + 1):
    ax = plt.subplot(int(np.ceil(display_count / ncols_display)), ncols_display, i)
    ax.imshow(convert_rgb(cv2.imread(f'./data/righteye/righteye_{str(i).zfill(6)}.jpg')))
    ax.set_title(f'{i}')
    ax.axis('off')
plt.suptitle(f'Right eyes: First {display_count} images', fontsize=18);
plt.tight_layout(pad=2)
# plt.savefig('./graphics/right_eyes_15.png')


Building test set¶

In [36]:
for i in range(1001,1101):
    img = cv2.imread(f'./kaggle_celeb_images/img_align_celeba/img_align_celeba/{str(i).zfill(6)}.jpg')
    eye_width = 32
    eye_height = 32
    x = df_celeb_landmarks['lefteye_x'][i - 1] - int(eye_width/2)
    y = df_celeb_landmarks['lefteye_y'][i - 1] - int(eye_height/2)
    crop_img = img[y : y + eye_height,
                   x : x + eye_width]
    cv2.imwrite(f'./test/lefteye_{str(i).zfill(6)}.jpg', crop_img)
    
for i in range(1001,1101):
    img = cv2.imread(f'./kaggle_celeb_images/img_align_celeba/img_align_celeba/{str(i).zfill(6)}.jpg')
    eye_width = 32
    eye_height = 32
    x = df_celeb_landmarks['righteye_x'][i - 1] - int(eye_width/2)
    y = df_celeb_landmarks['righteye_y'][i - 1] - int(eye_height/2)
    crop_img = img[y : y + eye_height,
                   x : x + eye_width]
    cv2.imwrite(f'./test/righteye_{str(i).zfill(6)}.jpg', crop_img)

Model 1¶

Viola-Jones object detection framework¶

The Viola-Jones framework—well known for frontal face detection—is used in face detection features of apps like Snapchat (see article).


Summary:

  1. Images are computationally expensive to process; the framework contributes a new way of representing the image: as an integral image (also related: summed-area-tables).
    • Computing Haar-like features, which are used for detection, can hence be done in constant time, i.e. $O(1)$ (check out Big O notation). This means that time taken to compute does not depend on input size.
  2. The framework also leverages the AdaBoost process in feature selection for "fast classification".
    • This is important because the number of Haar-like features per image far exceeds the number of pixels in the image.
    • In the framework, features, rather than pixels, are used directly. Simple rectangle features are used.
  3. The paper also describes how to focus attention on "promising regions of the image". This also has wide-reaching implications for practical use.
    • Using a trained classifier, the number of locations to pay attention to for more complex detection is reduced by more than half.
    • Sub-windows that do not "pass" are not processed further.
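To see why rectangle features are cheap, here is a sketch (my own NumPy illustration, not code from the paper) of a two-rectangle Haar-like feature computed via four corner lookups per rectangle on a zero-padded integral image:

```python
import numpy as np

def integral_image(img):
    # Zero-pad on top/left so every rectangle sum needs exactly 4 lookups
    return np.pad(img.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))

def rect_sum(ii, x, y, w, h):
    # Sum of pixels in the rectangle with top-left (x, y): O(1) per call
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    # Left half minus right half: responds to vertical edges
    return rect_sum(ii, x, y, w // 2, h) - rect_sum(ii, x + w // 2, y, w // 2, h)

img = np.array([[1, 1, 9, 9],
                [1, 1, 9, 9]])
ii = integral_image(img)
feature = two_rect_feature(ii, 0, 0, 4, 2)  # 4 - 36 = -32
```

Once the integral image is built (one pass over the pixels), every feature evaluation costs the same few lookups regardless of rectangle size, which is what makes the constant-time claim work.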

Reference / Paper: Viola, P. & Jones, M. (2001). "Robust Real-time Object Detection". Second International Workshop on Statistical and Computational Theories of Vision -- Modeling, Learning, Computing, and Sampling. Vancouver, Canada, July 13, 2001. LINK

Useful summary article: LINK

Integral images:¶

  • Value at point (x, y) = Sum (pixels above and to left of point)

Useful readings

  • https://theailearner.com/tag/cv2-integral/
  • https://levelup.gitconnected.com/the-integral-image-4df3df5dce35
In [37]:
#  Loading the image to be tested
test_image = cv2.imread('./kaggle_celeb_images/img_align_celeba/img_align_celeba/000001.jpg')
# Convert to grayscale; the OpenCV detector expects grayscale input
test_image_gray = cv2.cvtColor(test_image, cv2.COLOR_BGR2GRAY)
print(f'===== Test Image, {type(test_image_gray)} =====')
print(test_image_gray)
print('===== Shape =====')
print(test_image_gray.shape)
# Displaying grayscale image
plt.axis(False)
plt.imshow(test_image_gray, cmap='gray');
===== Test Image, <class 'numpy.ndarray'> =====
[[233 233 233 ... 232 241 241]
 [233 233 233 ... 234 241 241]
 [233 233 233 ... 236 241 242]
 ...
 [ 88  63  93 ...  72  73  73]
 [ 77  85 113 ...  66  68  68]
 [115 151 192 ...  66  68  68]]
===== Shape =====
(218, 178)
In [38]:
# getting to an integral image using cv2
int_img_cv2 = cv2.integral(test_image_gray)
display(int_img_cv2.shape, int_img_cv2)
(219, 179)
array([[      0,       0,       0, ...,       0,       0,       0],
       [      0,     233,     466, ...,   31180,   31421,   31662],
       [      0,     466,     932, ...,   62286,   62768,   63250],
       ...,
       [      0,   38151,   75925, ..., 5598109, 5636022, 5674017],
       [      0,   38228,   76087, ..., 5614686, 5652667, 5690730],
       [      0,   38343,   76353, ..., 5631455, 5669504, 5707635]],
      dtype=int32)
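Note that cv2.integral returns an array one row and one column larger than the input, zero-padded on the top and left (hence (219, 179) above). The same result can be reproduced with cumulative sums; a sketch on a toy array:

```python
import numpy as np

toy = np.array([[233, 233],
                [233, 233]], dtype=np.uint8)

# Equivalent of cv2.integral(toy): cumulative sums along both axes,
# zero-padded on the top/left edges
integral = np.pad(
    toy.astype(np.int64).cumsum(axis=0).cumsum(axis=1),
    ((1, 0), (1, 0)),
)
# integral[y, x] now holds the sum of all pixels above and to the left of (x, y)
```

The extra zero row/column is a convenience: rectangle sums at the image border then need no special-casing.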

The below is a class diagram by César de Souza, who built up the framework in C#.

Souza's Class Diagram

Reference: https://www.codeproject.com/Articles/441226/Haar-feature-Object-Detection-in-Csharp

While some time was invested in trying to build something similar from scratch in Python, it proved to be out of reach for now.

Below, the Viola-Jones object detection framework is executed using a cascade classifier downloaded from the OpenCV GitHub repository.

In [39]:
# Source: https://github.com/opencv/opencv/tree/master/data
haar_cascade_face = cv2.CascadeClassifier(
    './opencv_data/haarcascades/haarcascade_frontalface_alt2.xml'
)
In [40]:
def detect_faces(cascade, test_image, scaleFactor = 1.1):
    # create a copy of the image to prevent any changes to the original one.
    image_copy = test_image.copy()
    
    #convert the test image to gray scale as opencv face detector expects gray images
    gray_image = cv2.cvtColor(image_copy, cv2.COLOR_BGR2GRAY)
    
    # Applying the haar classifier to detect faces
    faces_rect = cascade.detectMultiScale(
        gray_image,
        scaleFactor=scaleFactor, 
        minNeighbors=5)
    
    detection_count = 0
    for (x, y, w, h) in faces_rect:
        cv2.rectangle(
            img=image_copy,
            pt1=(x, y),
            pt2=(x+w, y+h),
            color=(0, 255, 0),
            thickness=2)
        detection_count += 1
        
    print(f'{detection_count} faces detected')
    return image_copy
In [41]:
faces_rects = haar_cascade_face.detectMultiScale(
    test_image_gray,
    scaleFactor = 1.2,
    minNeighbors = 5);

# Let us print the no. of faces found
print('Faces found: ', len(faces_rects))
Faces found:  1
In [42]:
for (x,y,w,h) in faces_rects:
     cv2.rectangle(test_image, (x, y), (x+w, y+h), (0, 255, 0), 2)
In [43]:
#convert image to RGB and show image
plt.imshow(convert_rgb(test_image));
In [44]:
#loading image
test_image2 = cv2.imread(
    './kaggle_celeb_images/img_align_celeba/img_align_celeba/000002.jpg'
)
plt.imshow(test_image2);
In [45]:
#call the function to detect faces
faces = detect_faces(haar_cascade_face, test_image2)
1 faces detected
In [46]:
#convert to RGB and display image
plt.imshow(convert_rgb(faces))
plt.axis('off');
In [47]:
%%time
#loading image
test_image3 = cv2.imread(
    './graphics/golden_globes_1.png'
)

#call the function to detect faces
faces = detect_faces(haar_cascade_face, test_image3)

#convert to RGB and display image
plt.figure(figsize=(16,10))
plt.imshow(convert_rgb(faces))
plt.axis('off');
7 faces detected
CPU times: total: 1.02 s
Wall time: 234 ms
Out[47]:
(-0.5, 1370.5, 908.5, -0.5)
In [48]:
cv2.imwrite('./graphics/detected_faces.png', faces)
Out[48]:
True

Performance of Viola-Jones frontal face detector¶

  • Since we are looking for frontal faces only, all frontal faces (7 of 7) were detected, even those that were blurred in the background.
  • The detector also ignored the thumbnail at the bottom of the image.
  • Notably, this took less than one second (from the point of reading in the image to plotting out the detected frontal faces)!

Face-blurring function¶

In [49]:
# Credit: https://github.com/nithindd/aind_computer_vision/blob/master/CV_project.ipynb
def blurface(image):
    denoised_image = cv2.fastNlMeansDenoisingColored(image, None, 10, 10, 21, 7)

    gray = cv2.cvtColor(denoised_image, cv2.COLOR_RGB2GRAY)

    # Extract the pre-trained face detector from an xml file
    face_cascade = cv2.CascadeClassifier('./opencv_data/haarcascades/haarcascade_frontalface_default.xml')

    # Detect the faces in image
    faces = face_cascade.detectMultiScale(gray, 1.1, 10)

    # Make a copy of the original image to blur
    final_image = np.copy(image)

    # Blur
    width = 40
    kernel = np.ones((width, width),np.float32) / 1600
    image_with_blur = cv2.filter2D(image, -1, kernel)

    for (x,y,w,h) in faces:
        padding = 30
        x_start = max(x - padding, 0)
        y_start = max(y - padding, 0)
        x_end = min(x + w + padding, image.shape[1])
        y_end = min(y + h + padding, image.shape[0])
        final_image[y_start:y_end, x_start:x_end] = cv2.filter2D(image_with_blur[y_start:y_end, x_start:x_end], -1, kernel)
    
    return final_image
In [50]:
plt.imshow(convert_rgb(blurface(test_image3)))
plt.axis('off');
plt.savefig('./graphics/blurred_faces.png')

Model 2¶

Sequential CNN model with custom dropout layer


Load in data

In [51]:
data_dir = './data/'

img_height = 32
img_width = 32
batch_size = 20

train_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=42,
  image_size=(img_height, img_width),
  batch_size=batch_size)

val_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=42,
  image_size=(img_height, img_width),
  batch_size=batch_size)

train_ds_class_names = train_ds.class_names
print('train_ds Classes: ', train_ds_class_names)

val_ds_class_names = val_ds.class_names
print('val_ds Classes: ', val_ds_class_names)
Found 2000 files belonging to 2 classes.
Using 1600 files for training.
Found 2000 files belonging to 2 classes.
Using 400 files for validation.
train_ds Classes:  ['lefteye', 'righteye']
val_ds Classes:  ['lefteye', 'righteye']

You can create your own layers with specific characteristics by defining a new class for that layer, inheriting from an existing layer class and delegating to the parent implementation via super().

In [52]:
# create monte-carlo dropout class (MCDropout)
class MCDropout(keras.layers.Dropout):
    def call(self, inputs):
        return super().call(inputs, training=True)
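The point of keeping dropout active at inference (training=True) is that repeated stochastic forward passes can be averaged into a predictive mean, with the spread across passes serving as an uncertainty estimate. A NumPy sketch of that averaging step, independent of Keras and using made-up per-pass probabilities:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for T stochastic forward passes on one 2-class input; with the
# model above this would be np.stack([model(x, training=True) for _ in range(T)])
T = 100
stochastic_probs = rng.normal(loc=[0.8, 0.2], scale=0.05, size=(T, 2))

mc_mean = stochastic_probs.mean(axis=0)  # Monte-Carlo predictive mean
mc_std = stochastic_probs.std(axis=0)    # spread across passes = uncertainty
predicted_class = int(mc_mean.argmax())
```

A large mc_std flags inputs the model is unsure about, which a single deterministic forward pass cannot reveal.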
In [53]:
# instantiate model with layers
model_cnn_mcdropout = keras.models.Sequential(
    [
        keras.layers.Conv2D(filters=8,
                            kernel_size=3,
                            activation='relu',
                            input_shape=(32, 32, 3)),
        keras.layers.Conv2D(filters=16,
                            kernel_size=3,
                            activation='relu'),
        keras.layers.Flatten(),
        MCDropout(rate=0.2),
        keras.layers.Dense(256,
                           activation='relu',
                           kernel_initializer='he_normal'),
        MCDropout(rate=0.2),
        keras.layers.Dense(256,
                           activation='relu',
                           kernel_initializer='he_normal'),
        MCDropout(rate=0.2),
        keras.layers.Dense(2,
                           activation='softmax'),
    ]
)
In [54]:
# compile
model_cnn_mcdropout.compile(
    optimizer=keras.optimizers.Adam(0.001),
    loss=keras.losses.sparse_categorical_crossentropy,
    metrics=['acc'],
)

Cross entropy helps determine how well the model fits the data.

It is represented by:

$-\sum_{class=1}^{M} Observed_{class} \cdot \log(PredictedProbability_{class})$

where $M$ is the total number of classes.

In neural networks, the last layer, which is usually a softmax layer, converts the raw output values into predicted probabilities for each possible class.
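A quick numeric check of the formula for a single 2-class example (the probabilities are made up for illustration):

```python
import numpy as np

# One-hot observed label (true class is class 0) and softmax output
observed = np.array([1.0, 0.0])
predicted = np.array([0.9, 0.1])

# Cross entropy: -sum over classes of observed * log(predicted probability)
cross_entropy = -np.sum(observed * np.log(predicted))

# A poor prediction on the same label gives a much larger loss
worse = -np.sum(observed * np.log(np.array([0.1, 0.9])))
```

Only the true class's predicted probability contributes (the other terms are zeroed by the one-hot label), so the loss is simply $-\log$ of the probability assigned to the correct class.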

In [55]:
%%time
# fit
epochs = 50

history = model_cnn_mcdropout.fit(
    train_ds,
    epochs=epochs,
    validation_data=val_ds,
    verbose=1
)
Epoch 1/50
80/80 [==============================] - 3s 13ms/step - loss: 48.5973 - acc: 0.6419 - val_loss: 1.2608 - val_acc: 0.6950
Epoch 2/50
80/80 [==============================] - 1s 10ms/step - loss: 0.7460 - acc: 0.7812 - val_loss: 0.5641 - val_acc: 0.8250
Epoch 3/50
80/80 [==============================] - 1s 11ms/step - loss: 0.4024 - acc: 0.8831 - val_loss: 0.5366 - val_acc: 0.8575
Epoch 4/50
80/80 [==============================] - 1s 11ms/step - loss: 0.2746 - acc: 0.9187 - val_loss: 0.5860 - val_acc: 0.8650
Epoch 5/50
80/80 [==============================] - 1s 11ms/step - loss: 0.2291 - acc: 0.9212 - val_loss: 0.6301 - val_acc: 0.8750
Epoch 6/50
80/80 [==============================] - 1s 11ms/step - loss: 0.1576 - acc: 0.9475 - val_loss: 0.6130 - val_acc: 0.8725
Epoch 7/50
80/80 [==============================] - 1s 11ms/step - loss: 0.1766 - acc: 0.9444 - val_loss: 0.6402 - val_acc: 0.8325
Epoch 8/50
80/80 [==============================] - 1s 11ms/step - loss: 0.1708 - acc: 0.9419 - val_loss: 0.6403 - val_acc: 0.8550
Epoch 9/50
80/80 [==============================] - 1s 12ms/step - loss: 0.0972 - acc: 0.9656 - val_loss: 0.6961 - val_acc: 0.8850
Epoch 10/50
80/80 [==============================] - 1s 11ms/step - loss: 0.1123 - acc: 0.9638 - val_loss: 0.5293 - val_acc: 0.8925
Epoch 11/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0845 - acc: 0.9712 - val_loss: 0.6031 - val_acc: 0.8750
Epoch 12/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0923 - acc: 0.9700 - val_loss: 0.5084 - val_acc: 0.8750
Epoch 13/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0982 - acc: 0.9712 - val_loss: 0.4964 - val_acc: 0.8800
Epoch 14/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0696 - acc: 0.9737 - val_loss: 0.5132 - val_acc: 0.8950
Epoch 15/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0647 - acc: 0.9850 - val_loss: 0.7304 - val_acc: 0.8975
Epoch 16/50
80/80 [==============================] - 1s 11ms/step - loss: 0.1216 - acc: 0.9688 - val_loss: 0.5898 - val_acc: 0.9025
Epoch 17/50
80/80 [==============================] - 1s 11ms/step - loss: 0.1194 - acc: 0.9681 - val_loss: 0.5057 - val_acc: 0.9050
Epoch 18/50
80/80 [==============================] - 1s 11ms/step - loss: 0.1174 - acc: 0.9688 - val_loss: 0.5502 - val_acc: 0.8900
Epoch 19/50
80/80 [==============================] - 1s 11ms/step - loss: 0.1221 - acc: 0.9656 - val_loss: 0.6293 - val_acc: 0.8350
Epoch 20/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0989 - acc: 0.9731 - val_loss: 0.5575 - val_acc: 0.8875
Epoch 21/50
80/80 [==============================] - 1s 11ms/step - loss: 0.1046 - acc: 0.9669 - val_loss: 0.7082 - val_acc: 0.8700
Epoch 22/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0844 - acc: 0.9731 - val_loss: 0.7772 - val_acc: 0.8675
Epoch 23/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0609 - acc: 0.9825 - val_loss: 0.5562 - val_acc: 0.8975
Epoch 24/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0368 - acc: 0.9887 - val_loss: 0.5517 - val_acc: 0.9175
Epoch 25/50
80/80 [==============================] - 1s 11ms/step - loss: 0.1629 - acc: 0.9581 - val_loss: 0.6989 - val_acc: 0.8575
Epoch 26/50
80/80 [==============================] - 1s 11ms/step - loss: 0.1525 - acc: 0.9656 - val_loss: 0.5893 - val_acc: 0.8925
Epoch 27/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0702 - acc: 0.9800 - val_loss: 0.7342 - val_acc: 0.9125
Epoch 28/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0291 - acc: 0.9900 - val_loss: 0.6845 - val_acc: 0.9075
Epoch 29/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0192 - acc: 0.9956 - val_loss: 0.5188 - val_acc: 0.9200
Epoch 30/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0122 - acc: 0.9950 - val_loss: 0.7115 - val_acc: 0.9050
Epoch 31/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0424 - acc: 0.9875 - val_loss: 0.8162 - val_acc: 0.8675
Epoch 32/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0542 - acc: 0.9850 - val_loss: 0.5119 - val_acc: 0.9275
Epoch 33/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0290 - acc: 0.9900 - val_loss: 0.6000 - val_acc: 0.9150
Epoch 34/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0637 - acc: 0.9875 - val_loss: 0.5677 - val_acc: 0.9425
Epoch 35/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0630 - acc: 0.9869 - val_loss: 0.6466 - val_acc: 0.9100
Epoch 36/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0328 - acc: 0.9912 - val_loss: 0.4655 - val_acc: 0.9175
Epoch 37/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0186 - acc: 0.9937 - val_loss: 0.6348 - val_acc: 0.9275
Epoch 38/50
80/80 [==============================] - 1s 12ms/step - loss: 0.0557 - acc: 0.9869 - val_loss: 0.6479 - val_acc: 0.9125
Epoch 39/50
80/80 [==============================] - 1s 12ms/step - loss: 0.1798 - acc: 0.9525 - val_loss: 0.5913 - val_acc: 0.8775
Epoch 40/50
80/80 [==============================] - 1s 12ms/step - loss: 0.1149 - acc: 0.9681 - val_loss: 0.8366 - val_acc: 0.8900
Epoch 41/50
80/80 [==============================] - 1s 12ms/step - loss: 0.0423 - acc: 0.9862 - val_loss: 0.8819 - val_acc: 0.9100
Epoch 42/50
80/80 [==============================] - 1s 12ms/step - loss: 0.0846 - acc: 0.9837 - val_loss: 1.0646 - val_acc: 0.8525
Epoch 43/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0318 - acc: 0.9900 - val_loss: 0.6441 - val_acc: 0.8950
Epoch 44/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0380 - acc: 0.9869 - val_loss: 1.1316 - val_acc: 0.8975
Epoch 45/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0224 - acc: 0.9937 - val_loss: 0.6188 - val_acc: 0.9025
Epoch 46/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0250 - acc: 0.9925 - val_loss: 0.4991 - val_acc: 0.9250
Epoch 47/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0172 - acc: 0.9969 - val_loss: 0.6428 - val_acc: 0.9250
Epoch 48/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0475 - acc: 0.9900 - val_loss: 1.0592 - val_acc: 0.8700
Epoch 49/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0711 - acc: 0.9869 - val_loss: 1.2203 - val_acc: 0.8425
Epoch 50/50
80/80 [==============================] - 1s 11ms/step - loss: 0.1007 - acc: 0.9800 - val_loss: 0.9689 - val_acc: 0.8675
CPU times: total: 2min 41s
Wall time: 49.1 s

GPU (~50s)

In [56]:
# save model
model_cnn_mcdropout.save('./models/cnn_mcdropout')
INFO:tensorflow:Assets written to: ./models/cnn_mcdropout\assets
In [57]:
history.history
Out[57]:
{'loss': [48.59730911254883,
  0.7460277676582336,
  0.4023975729942322,
  0.2746173143386841,
  0.22910696268081665,
  0.157640278339386,
  0.17655536532402039,
  0.17081676423549652,
  0.09716537594795227,
  0.11231448501348495,
  0.08453074097633362,
  0.09234211593866348,
  0.09816806018352509,
  0.06961740553379059,
  0.06466124206781387,
  0.12158027291297913,
  0.1193663477897644,
  0.11738354712724686,
  0.12211340665817261,
  0.09885992109775543,
  0.1046118512749672,
  0.08440323173999786,
  0.06089307367801666,
  0.03677494078874588,
  0.16287995874881744,
  0.1525363028049469,
  0.07021334767341614,
  0.02909652516245842,
  0.019185511395335197,
  0.012161515653133392,
  0.0423540435731411,
  0.054193541407585144,
  0.02901754342019558,
  0.0637383833527565,
  0.06301436573266983,
  0.032772310078144073,
  0.018596326932311058,
  0.05567879229784012,
  0.1798142045736313,
  0.11493930220603943,
  0.04227212816476822,
  0.08458048850297928,
  0.03178039565682411,
  0.0379747673869133,
  0.022395286709070206,
  0.025044577196240425,
  0.017231645062565804,
  0.04747198149561882,
  0.07105182856321335,
  0.10073041915893555],
 'acc': [0.6418750286102295,
  0.78125,
  0.8831250071525574,
  0.918749988079071,
  0.9212499856948853,
  0.9474999904632568,
  0.9443749785423279,
  0.9418749809265137,
  0.965624988079071,
  0.9637500047683716,
  0.9712499976158142,
  0.9700000286102295,
  0.9712499976158142,
  0.9737499952316284,
  0.9850000143051147,
  0.96875,
  0.9681249856948853,
  0.96875,
  0.965624988079071,
  0.9731249809265137,
  0.9668750166893005,
  0.9731249809265137,
  0.9825000166893005,
  0.9887499809265137,
  0.9581249952316284,
  0.965624988079071,
  0.9800000190734863,
  0.9900000095367432,
  0.9956250190734863,
  0.9950000047683716,
  0.987500011920929,
  0.9850000143051147,
  0.9900000095367432,
  0.987500011920929,
  0.9868749976158142,
  0.9912499785423279,
  0.9937499761581421,
  0.9868749976158142,
  0.9524999856948853,
  0.9681249856948853,
  0.9862499833106995,
  0.9837499856948853,
  0.9900000095367432,
  0.9868749976158142,
  0.9937499761581421,
  0.9925000071525574,
  0.996874988079071,
  0.9900000095367432,
  0.9868749976158142,
  0.9800000190734863],
 'val_loss': [1.2607591152191162,
  0.5641356110572815,
  0.536637544631958,
  0.5859897136688232,
  0.6301290392875671,
  0.6130388975143433,
  0.6401651501655579,
  0.6402987837791443,
  0.696103572845459,
  0.529287576675415,
  0.6030967235565186,
  0.5084162354469299,
  0.49638915061950684,
  0.5132113099098206,
  0.7304136753082275,
  0.589782178401947,
  0.5057395100593567,
  0.5502204298973083,
  0.6292506456375122,
  0.5574504733085632,
  0.7082326412200928,
  0.7771573662757874,
  0.5561748147010803,
  0.5517317056655884,
  0.6989285349845886,
  0.589296281337738,
  0.7342371940612793,
  0.6844755411148071,
  0.518820583820343,
  0.7115486860275269,
  0.8161992430686951,
  0.5118547081947327,
  0.599992036819458,
  0.5676887035369873,
  0.646593451499939,
  0.46552062034606934,
  0.6347538232803345,
  0.6479085683822632,
  0.5912843942642212,
  0.8365575671195984,
  0.8819077014923096,
  1.0645736455917358,
  0.6441012620925903,
  1.131626009941101,
  0.6187908053398132,
  0.4991234838962555,
  0.6428235769271851,
  1.0591810941696167,
  1.2202801704406738,
  0.9688865542411804],
 'val_acc': [0.6949999928474426,
  0.824999988079071,
  0.8575000166893005,
  0.8650000095367432,
  0.875,
  0.8725000023841858,
  0.8324999809265137,
  0.8550000190734863,
  0.8849999904632568,
  0.8924999833106995,
  0.875,
  0.875,
  0.8799999952316284,
  0.8949999809265137,
  0.8974999785423279,
  0.9024999737739563,
  0.9049999713897705,
  0.8899999856948853,
  0.8349999785423279,
  0.887499988079071,
  0.8700000047683716,
  0.8675000071525574,
  0.8974999785423279,
  0.9175000190734863,
  0.8575000166893005,
  0.8924999833106995,
  0.9125000238418579,
  0.9075000286102295,
  0.9200000166893005,
  0.9049999713897705,
  0.8675000071525574,
  0.9275000095367432,
  0.9150000214576721,
  0.9424999952316284,
  0.9100000262260437,
  0.9175000190734863,
  0.9275000095367432,
  0.9125000238418579,
  0.8774999976158142,
  0.8899999856948853,
  0.9100000262260437,
  0.8525000214576721,
  0.8949999809265137,
  0.8974999785423279,
  0.9024999737739563,
  0.925000011920929,
  0.925000011920929,
  0.8700000047683716,
  0.8424999713897705,
  0.8675000071525574]}
In [58]:
# summarize history for accuracy
plt.figure(figsize=(12,8))
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model_cnn_mcdropout accuracy', fontsize=18)
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left', fontsize=12)
plt.tight_layout()
plt.savefig('./graphics/model_cnn_acc.png')
In [59]:
# summarize history for loss
plt.figure(figsize=(12,8))
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model_cnn_mcdropout loss', fontsize=18)
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left', fontsize=12)
plt.savefig('./graphics/model_cnn_loss.png')
In [60]:
model_cnn_mcdropout.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 30, 30, 8)         224       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 16)        1168      
_________________________________________________________________
flatten (Flatten)            (None, 12544)             0         
_________________________________________________________________
mc_dropout (MCDropout)       (None, 12544)             0         
_________________________________________________________________
dense (Dense)                (None, 256)               3211520   
_________________________________________________________________
mc_dropout_1 (MCDropout)     (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 256)               65792     
_________________________________________________________________
mc_dropout_2 (MCDropout)     (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 514       
=================================================================
Total params: 3,279,218
Trainable params: 3,279,218
Non-trainable params: 0
_________________________________________________________________
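The point of the MCDropout layers above is that dropout stays active at inference, so repeated stochastic forward passes yield both an averaged prediction and an uncertainty estimate. A minimal numpy sketch of that averaging step (hypothetical shapes and weights, standing in for the real model's stochastic predictions):

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mc_predict(features, weights, n_passes=100, rate=0.5):
    """Average class probabilities over stochastic dropout passes."""
    preds = []
    for _ in range(n_passes):
        mask = rng.random(features.shape) >= rate   # random dropout mask
        dropped = features * mask / (1.0 - rate)    # inverted-dropout scaling
        preds.append(softmax(dropped @ weights))
    preds = np.stack(preds)                         # (n_passes, n_samples, n_classes)
    return preds.mean(axis=0), preds.std(axis=0)    # prediction + uncertainty

features = rng.normal(size=(4, 8))   # 4 samples, 8 hidden features
weights = rng.normal(size=(8, 2))    # 2 classes: left vs right eye
mean_probs, std_probs = mc_predict(features, weights)
print(mean_probs.shape, std_probs.shape)  # (4, 2) (4, 2)
```

A large `std_probs` for a sample flags a prediction the model is unsure about, which a single deterministic pass cannot tell you.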

Model 3¶

Back to top

Pre-trained VGG16 model

In [61]:
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.applications.vgg16 import decode_predictions
from tensorflow.keras.applications.vgg16 import VGG16
# load the model
model_vgg16 = VGG16()
# load an image from file
plt.imshow(convert_rgb(cv2.imread('./data/lefteye/lefteye_000001.jpg')))
# load in version with target size set to suit VGG16
image = load_img('./data/lefteye/lefteye_000001.jpg',
                 target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the VGG model
image = preprocess_input(image)
# predict the probability across all output classes
yhat = model_vgg16.predict(image)
# convert the probabilities to class labels
label = decode_predictions(yhat)
# retrieve the most likely result, e.g. highest probability
label = label[0][0]
# print the classification
print('%s (%.2f%%)' % (label[1], label[2]*100))
shower_curtain (17.55%)

The "left" eye (or any eye) is not recognized by the pretrained model.

This is to be expected: VGG16's 1,000 ImageNet classes do not include eyes or other facial features.

The way to resolve this is transfer learning: reuse VGG16's convolutional layers as a feature extractor and train a new classification head on our own classes.

In [62]:
model_vgg16.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000   
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________

Transfer Learning with VGG16¶

To build the custom model, we place the pretrained base at the start of a tf.keras.models.Sequential() stack.

The aim is to reuse a well-trained model purely for feature extraction; we do not care about the 1,000 ImageNet classes VGG16 was originally trained on (learned from over a million labelled images).

Hence, we set:

  • include_top=False (drop VGG16's fully connected classifier head) and pretrained_base.trainable = False (freeze the convolutional weights).
In [63]:
pretrained_base = VGG16(
    weights='imagenet', include_top=False, input_shape=(32,32,3)
)
pretrained_base.trainable = False
In [64]:
model_vgg16_custom = keras.models.Sequential(
    [
        pretrained_base,
        # global average pooling: collapse each feature map to one value, reducing downstream parameters
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(256, activation=tf.nn.relu, name="dense1"),
        keras.layers.Dense(10, activation=tf.nn.relu, name="dense2"),
        keras.layers.Dense(2, activation=tf.nn.softmax)
    ]
)
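For reference, GlobalAveragePooling2D simply averages each feature map over its spatial axes, so a (height, width, channels) activation collapses to one value per channel. A numpy illustration (the 7×7×512 shape mirrors VGG16's block5 output at full resolution; an assumption for illustration, not taken from this notebook):

```python
import numpy as np

# a fake batch of convolutional activations: (batch, height, width, channels)
activations = np.random.rand(2, 7, 7, 512)

# global average pooling = mean over the two spatial axes
pooled = activations.mean(axis=(1, 2))
print(pooled.shape)  # (2, 512)
```

With the 32×32 input used above, the base's output is already 1×1×512, so the pooling there is effectively a squeeze; for larger inputs it genuinely shrinks the tensor handed to the dense layers.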
In [65]:
# compile
model_vgg16_custom.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss='sparse_categorical_crossentropy',
    metrics=['acc']
)
In [66]:
%%time
# fit
epochs = 50
history_vgg16 = model_vgg16_custom.fit(train_ds,
                                       validation_data=val_ds,
                                       epochs=epochs,
                                       verbose=1)
Epoch 1/50
80/80 [==============================] - 3s 21ms/step - loss: 1.4765 - acc: 0.5100 - val_loss: 0.6965 - val_acc: 0.4500
Epoch 2/50
80/80 [==============================] - 1s 17ms/step - loss: 0.6876 - acc: 0.5144 - val_loss: 0.7077 - val_acc: 0.4500
Epoch 3/50
80/80 [==============================] - 1s 17ms/step - loss: 0.6728 - acc: 0.5581 - val_loss: 0.6859 - val_acc: 0.6100
Epoch 4/50
80/80 [==============================] - 1s 17ms/step - loss: 0.6251 - acc: 0.6619 - val_loss: 0.6668 - val_acc: 0.7000
Epoch 5/50
80/80 [==============================] - 1s 17ms/step - loss: 0.6129 - acc: 0.6725 - val_loss: 0.6595 - val_acc: 0.6575
Epoch 6/50
80/80 [==============================] - 1s 17ms/step - loss: 0.5434 - acc: 0.7531 - val_loss: 0.6429 - val_acc: 0.7200
Epoch 7/50
80/80 [==============================] - 1s 16ms/step - loss: 0.4709 - acc: 0.8188 - val_loss: 0.6574 - val_acc: 0.7175
Epoch 8/50
80/80 [==============================] - 1s 17ms/step - loss: 0.4397 - acc: 0.8331 - val_loss: 0.7203 - val_acc: 0.6975
Epoch 9/50
80/80 [==============================] - 1s 17ms/step - loss: 0.3785 - acc: 0.8594 - val_loss: 0.7965 - val_acc: 0.7250
Epoch 10/50
80/80 [==============================] - 1s 16ms/step - loss: 0.4289 - acc: 0.8181 - val_loss: 0.9504 - val_acc: 0.6850
Epoch 11/50
80/80 [==============================] - 1s 17ms/step - loss: 0.3449 - acc: 0.8725 - val_loss: 0.8182 - val_acc: 0.7075
Epoch 12/50
80/80 [==============================] - 1s 17ms/step - loss: 0.3032 - acc: 0.8900 - val_loss: 0.8576 - val_acc: 0.7325
Epoch 13/50
80/80 [==============================] - 1s 17ms/step - loss: 0.2780 - acc: 0.8950 - val_loss: 0.9569 - val_acc: 0.7375
Epoch 14/50
80/80 [==============================] - 1s 17ms/step - loss: 0.1970 - acc: 0.9275 - val_loss: 0.8951 - val_acc: 0.7325
Epoch 15/50
80/80 [==============================] - 1s 17ms/step - loss: 0.1696 - acc: 0.9369 - val_loss: 1.0397 - val_acc: 0.7500
Epoch 16/50
80/80 [==============================] - 1s 17ms/step - loss: 0.1957 - acc: 0.9281 - val_loss: 1.0396 - val_acc: 0.7250
Epoch 17/50
80/80 [==============================] - 1s 16ms/step - loss: 0.1819 - acc: 0.9294 - val_loss: 0.9802 - val_acc: 0.7200
Epoch 18/50
80/80 [==============================] - 1s 17ms/step - loss: 0.1727 - acc: 0.9306 - val_loss: 1.0162 - val_acc: 0.7250
Epoch 19/50
80/80 [==============================] - 1s 16ms/step - loss: 0.1089 - acc: 0.9550 - val_loss: 1.2267 - val_acc: 0.7350
Epoch 20/50
80/80 [==============================] - 1s 16ms/step - loss: 0.1104 - acc: 0.9556 - val_loss: 1.3025 - val_acc: 0.7375
Epoch 21/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0948 - acc: 0.9712 - val_loss: 1.4328 - val_acc: 0.7200
Epoch 22/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0752 - acc: 0.9706 - val_loss: 1.6057 - val_acc: 0.6875
Epoch 23/50
80/80 [==============================] - 1s 17ms/step - loss: 0.1452 - acc: 0.9400 - val_loss: 1.4922 - val_acc: 0.7200
Epoch 24/50
80/80 [==============================] - 1s 17ms/step - loss: 0.1052 - acc: 0.9613 - val_loss: 1.5552 - val_acc: 0.7350
Epoch 25/50
80/80 [==============================] - 1s 17ms/step - loss: 0.1236 - acc: 0.9563 - val_loss: 1.2826 - val_acc: 0.7250
Epoch 26/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0852 - acc: 0.9625 - val_loss: 1.3487 - val_acc: 0.7450
Epoch 27/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0617 - acc: 0.9675 - val_loss: 1.4334 - val_acc: 0.7525
Epoch 28/50
80/80 [==============================] - 1s 16ms/step - loss: 0.0389 - acc: 0.9837 - val_loss: 1.5800 - val_acc: 0.7250
Epoch 29/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0316 - acc: 0.9869 - val_loss: 1.5237 - val_acc: 0.7275
Epoch 30/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0535 - acc: 0.9769 - val_loss: 1.5089 - val_acc: 0.7450
Epoch 31/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0722 - acc: 0.9669 - val_loss: 1.2805 - val_acc: 0.6800
Epoch 32/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0637 - acc: 0.9762 - val_loss: 1.6969 - val_acc: 0.7175
Epoch 33/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0358 - acc: 0.9850 - val_loss: 1.7322 - val_acc: 0.7425
Epoch 34/50
80/80 [==============================] - 1s 16ms/step - loss: 0.1140 - acc: 0.9588 - val_loss: 1.5574 - val_acc: 0.7100
Epoch 35/50
80/80 [==============================] - 1s 16ms/step - loss: 0.0785 - acc: 0.9706 - val_loss: 1.5992 - val_acc: 0.7350
Epoch 36/50
80/80 [==============================] - 1s 16ms/step - loss: 0.0586 - acc: 0.9769 - val_loss: 1.5155 - val_acc: 0.7250
Epoch 37/50
80/80 [==============================] - 1s 16ms/step - loss: 0.0379 - acc: 0.9800 - val_loss: 1.6843 - val_acc: 0.7225
Epoch 38/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0458 - acc: 0.9781 - val_loss: 1.7057 - val_acc: 0.7550
Epoch 39/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0205 - acc: 0.9894 - val_loss: 1.6682 - val_acc: 0.7325
Epoch 40/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0171 - acc: 0.9912 - val_loss: 1.6130 - val_acc: 0.7600
Epoch 41/50
80/80 [==============================] - 1s 16ms/step - loss: 0.0132 - acc: 0.9919 - val_loss: 1.6507 - val_acc: 0.7425
Epoch 42/50
80/80 [==============================] - 1s 16ms/step - loss: 0.0123 - acc: 0.9919 - val_loss: 1.7524 - val_acc: 0.7425
Epoch 43/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0109 - acc: 0.9919 - val_loss: 1.7674 - val_acc: 0.7500
Epoch 44/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0112 - acc: 0.9919 - val_loss: 1.8062 - val_acc: 0.7450
Epoch 45/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0112 - acc: 0.9919 - val_loss: 1.8104 - val_acc: 0.7450
Epoch 46/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0100 - acc: 0.9919 - val_loss: 1.8506 - val_acc: 0.7450
Epoch 47/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0118 - acc: 0.9919 - val_loss: 1.7947 - val_acc: 0.7375
Epoch 48/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0107 - acc: 0.9919 - val_loss: 1.9970 - val_acc: 0.7250
Epoch 49/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0110 - acc: 0.9919 - val_loss: 1.9097 - val_acc: 0.7450
Epoch 50/50
80/80 [==============================] - 1s 17ms/step - loss: 0.0356 - acc: 0.9856 - val_loss: 2.0097 - val_acc: 0.7375
CPU times: total: 2min 55s
Wall time: 1min 9s

CPU (5min); GPU (1min)

In [67]:
model_vgg16_custom.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
vgg16 (Functional)           (None, 1, 1, 512)         14714688  
_________________________________________________________________
global_average_pooling2d (Gl (None, 512)               0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 512)               0         
_________________________________________________________________
dense1 (Dense)               (None, 256)               131328    
_________________________________________________________________
dense2 (Dense)               (None, 10)                2570      
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 22        
=================================================================
Total params: 14,848,608
Trainable params: 133,920
Non-trainable params: 14,714,688
_________________________________________________________________
In [68]:
model_vgg16_custom.save('./models/vgg_16_custom')
INFO:tensorflow:Assets written to: ./models/vgg_16_custom\assets
In [69]:
# summarize history for accuracy
plt.figure(figsize=(16,10))
plt.plot(history_vgg16.history['acc'])
plt.plot(history_vgg16.history['val_acc'])
plt.title('model_vgg16_custom accuracy', fontsize=18)
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left', fontsize=12)
plt.savefig('./graphics/vgg_16_custom_acc.png')
In [70]:
# summarize history for loss
plt.figure(figsize=(16,10))
plt.plot(history_vgg16.history['loss'])
plt.plot(history_vgg16.history['val_loss'])
plt.title('model_vgg16_custom loss', fontsize=18)
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left', fontsize=12)
plt.savefig('./graphics/vgg_16_custom_loss.png')

The loss plots diverge: training loss keeps falling while validation loss climbs, a classic sign of overfitting. Regularisation or early stopping would be worth trying.
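One remedy is early stopping: halt training once validation loss stops improving. Keras provides this as tf.keras.callbacks.EarlyStopping (passed via `callbacks=` in `model.fit`); the core logic is just patience bookkeeping, sketched here in plain Python:

```python
def early_stop_epoch(val_losses, patience=5):
    """Return the 1-indexed epoch at which training would stop,
    i.e. `patience` epochs after the last improvement."""
    best, best_epoch = float('inf'), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses)

# toy validation curve: improves, then plateaus
curve = [0.70, 0.64, 0.66, 0.60, 0.63, 0.65, 0.61, 0.62, 0.66]
print(early_stop_epoch(curve, patience=5))  # → 9
```

In Keras, `EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)` additionally rolls the weights back to the best epoch, which is usually what you want.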

Future Development¶

The Kaggle dataset supports many further experiments beyond what we covered here.

Next time, we may wish to build a classifier for whether a person is smiling or not based on their eye features.

Data in the attributes table includes whether the person in the image is smiling (1) or not (-1).
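One housekeeping note for that future classifier: sparse_categorical_crossentropy expects labels in {0, 1}, so the {-1, 1} coding in the attributes table would need remapping first. A hypothetical sketch with pandas (the three rows are stand-ins, not real attribute values):

```python
import pandas as pd

# stand-in for a slice of the CelebA attributes table
df = pd.DataFrame({'image_id': ['000001.jpg', '000002.jpg', '000003.jpg'],
                   'smiling': [1, -1, 1]})

# map -1 (not smiling) -> 0 and keep 1 (smiling) -> 1
df['smiling'] = df['smiling'].map({-1: 0, 1: 1})
print(df['smiling'].tolist())  # → [1, 0, 1]
```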

In [71]:
# Create smile dataframe
df_smile = df_celeb_attributes[['image_id', 'Smiling']].truncate(after=999)

# Lower-case columns
df_smile.columns = map(str.lower, df_smile.columns)

# Check df & truncation
df_smile.tail(3)
Out[71]:
image_id smiling
997 000998.jpg -1
998 000999.jpg -1
999 001000.jpg 1

Show images with smiling classes (1 for smiling, -1 for not smiling)

In [72]:
display_count = 15
ncols_display = 5

plt.figure(figsize=(16,12))

for i in range(1, display_count + 1):
    smile_class = df_smile['smiling'].loc[i - 1]
    ax = plt.subplot(int(np.ceil(display_count / ncols_display)), ncols_display, i)
    ax.imshow(convert_rgb(cv2.imread(f'./kaggle_celeb_images/img_align_celeba/img_align_celeba/{str(i).zfill(6)}.jpg')))
    ax.set_title(f'Image {i}, Smile : {smile_class}')
    ax.axis('off')
plt.suptitle(f'Faces: First {display_count} images', y=1, fontsize=18);
plt.tight_layout(pad=3)
# plt.savefig('./graphics/face_15.png')

faces

Bibliography¶

Back to top

  • Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros. What Makes Paris Look like Paris? ACM Transactions on Graphics (SIGGRAPH 2012), August 2012, vol. 31, No. 3.
  • S. Yang, P. Luo, C. C. Loy, and X. Tang, "From Facial Parts Responses to Face Detection: A Deep Learning Approach", in IEEE International Conference on Computer Vision (ICCV), 2015
  • Viola, P. & Jones, M. (2001). "Robust Real-time Object Detection". Second International Workshop on Statistical and Computational Theories of Vision -- Modeling, Learning, Computing, and Sampling. Vancouver, Canada, July 13, 2001.

Other links:

  • On CNN: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D
    tf.keras.layers.Conv2D(
      filters, kernel_size, strides=(1, 1), padding='valid',
      data_format=None, dilation_rate=(1, 1), groups=1, activation=None,
      use_bias=True, kernel_initializer='glorot_uniform',
      bias_initializer='zeros', kernel_regularizer=None,
      bias_regularizer=None, activity_regularizer=None, kernel_constraint=None,
      bias_constraint=None, **kwargs
    )
    
  • On why use dropout between layers: https://arxiv.org/abs/1506.02142
  • https://www.tensorflow.org/tutorials/load_data/images
  • https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory
  • https://www.analyticsvidhya.com/blog/2020/11/tutorial-how-to-visualize-feature-maps-directly-from-cnn-layers/
  • https://stackoverflow.com/questions/49295311/what-is-the-difference-between-flatten-and-globalaveragepooling2d-in-keras

Other cool things to check out¶

  • https://github.com/tensorflow/tensor2tensor
  • https://pyimagesearch.com/2020/06/22/turning-any-cnn-image-classifier-into-an-object-detector-with-keras-tensorflow-and-opencv/
  • https://towardsdatascience.com/celebrity-face-generation-with-deep-convolutional-gans-40b96147a1c9
  • Ideas to explore: building reusable helper functions, and extracting feature maps from each model

Acknowledgements¶

A shout out to thank my instructional team at General Assembly, as well as the team at large for facilitating my learning journey. It's been fun learning with my coursemates and the GA community globally; thanks for the good times! Also, Josh Starmer at StatQuest has been a boon to society. :)

Thanks also to IMDA for their steadfast commitment to lifelong learning of digital skills and sponsorship of programmes like the Tech Immersion and Placement Programme (TIPP).

Special thanks to old buddies and new friends I've made in the process of reaching out via professional networks / through social circles. If you're reading this, you know who you are! :)

:bowtie: :beer: :pizza: :sparkling_heart: :muscle: :clap: :tada: